Software Safety: Why is there no Consensus?
Abstract
Development and assessment of safety critical software is governed by many standards. Given the growing dependence on software in a range of industries, one might expect to see these standards reflecting a growing maturity in processes for development and assessment of safety critical software, and an international consensus on best practice. In reality, whilst there are commonalities in the standards, there are also major variations among sectors and countries. There are even greater variations in industrial practice. This leads us to consider why the variation exists and whether any steps can be taken to obtain greater consensus. In this paper we start by clarifying the role of software in system safety, and briefly review the cost and effectiveness of current software development and assurance processes. We then investigate why there is such divergence in standards and practices, and consider the implications of this lack of commonality. We present some comparisons with other technologies to look for relevant insights. We then suggest some principles on which it might be possible to develop a cross-sector and international consensus. Our aim is to stimulate debate, not to present a “definitive” approach to achieving safety of systems including software.

The meaning of software safety

Software is becoming an increasingly important element of many safety-critical and safety-related systems. In many cases, software is the major determinant of system capability, e.g., in aircraft flight control systems, body electronics for cars, air traffic control, pacemakers and nuclear power plant control. For such systems, the software is often a major factor in the costs and risks of achieving and assuring safety. Costs can be of the order of $10M for each system on a modern aircraft.

Some people argue that the term “software safety” is a misnomer as software, in and of itself, is not hazardous, i.e., it is not toxic, does not have high kinetic energy, and so on. Here we use the term “software safety” simply as a shorthand for “the contribution of software to safety in its system context”. Software can contribute to hazards through inappropriate control of a system, particularly where it has full authority over some hazardous action. By “full authority” we mean that no other system or person can over-ride the software. Software can also contribute to hazards when its behaviour misleads system operators, and the operators thereby take inappropriate actions. Misleading operators is most critical if they have no means of cross-checking the presented information, or have learnt to trust the software even when there are discrepancies with other sources of data.

Described this way, software is much like any other technology, except that it can only contribute to unsafe conditions through systematic causes, i.e., there is no “random” failure or “wear-out” mechanism for software. Systematic failures arise from flaws or limitations in the software requirements, in the design, or in the implementation. Thus, software safety involves considering how we can eliminate such flaws and how we can know whether or not we have eliminated them. We refer to these two concerns as achieving and assuring safety.

Why is there a concern?

There is considerable debate on software safety in industry, academia, and government circles. This debate may seem slightly surprising, as software has a remarkably good track record.
There have been several high-profile accidents, e.g., Ariane 5 ([1]) and Therac 25 ([2]), and in aerospace the Cali accident has been attributed to software (more strictly, data ([3])), but a study of over 1,000 apparently “computer related” deaths ([4]) found that only 34 could be attributed to software issues. The critical failure rate of software in aerospace appears to be around 10⁻⁷ per hour ([5]), which is sufficient for it to have full authority over a hazardous/severe-major event and still meet certification targets. In fact, most aircraft accidents stem from mechanical or electrical causes, so why is there a concern? We believe the concern arises out of four related factors.

First, there is some scepticism about the accident data. It is widely believed that many accidents put down to human error were actually the result of operators (e.g., pilots) being misled by the software. Also, software failures typically leave no trace, and so may not be seen as contributory causes to accidents (e.g., the controversy over the Chinook crash on the Mull of Kintyre ([6])). Further, much commercial software is unreliable, leading to a general distrust of software. Many critical systems have a long history of “nuisance” failures, which suggests that more problems would arise if software had greater authority.

Second, systems and software are growing in complexity and authority at an unprecedented rate, and there is little confidence that current techniques for analysing and testing software will “keep up”. There have already been instances of projects where “cutting edge” design proposals have had to be rejected or scaled down because of the lack of suitable techniques for assuring the safety of the product.

Third, we do not know how to measure software safety; thus it is hard, when managing projects, to know which techniques are the best and most effective to apply, or when “enough has been done”.

Fourth, safety critical software is perceived to cost too much, both in relation to commercial software and in relation to the cost of the rest of the system. In modern applications, e.g., car or aircraft systems, it may represent the majority of the development costs.

These issues are inter-related; for example, costs will rise as software complexity increases.

The cost and effectiveness of current practices

It is difficult to obtain accurate data on the cost and effectiveness of software development and assessment processes, as such information is sensitive. The data in the following sections are based on figures from a range of critical application projects, primarily aerospace projects based in Europe and the USA; however, we are not at liberty to quote specific sources.

Costs: Costs from software requirements to the end of unit testing are the most readily comparable between projects. Typically 1-5 lines of code (LoC) are produced per man day, with more recent projects being nearer the higher figure. Salary costs vary, but this equates to around $150 to $250 per LoC, or $25M for a system containing 100 kLoC of code. Typically, testing is the primary means of gaining assurance. Although the costs of tests vary enormously, e.g., with hardware design, testing frequently consumes more than half the development and assessment budget. Also, in many projects, change traffic is high. We know of projects where, in effect, the whole of the software is built to certification standards three times. In general, the rework is due to late discovery of requirements or design flaws.
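As a rough cross-check of the cost figures above, the sketch below recomputes the per-LoC and whole-system costs from the quoted productivity range. The loaded daily labour cost is our own illustrative assumption, not a figure from the projects surveyed.

```python
# Back-of-envelope cost model for safety critical software, using the
# productivity figures quoted above. The loaded daily labour cost is an
# illustrative assumption, not project data.

loc_per_person_day = 2.5    # quoted range is 1-5 LoC per man day; mid-range assumed
daily_labour_cost = 500.0   # assumed loaded cost per person-day, USD (hypothetical)
system_size_loc = 100_000   # 100 kLoC system, as in the text

cost_per_loc = daily_labour_cost / loc_per_person_day
system_cost = cost_per_loc * system_size_loc

print(f"cost per LoC:   ${cost_per_loc:,.0f}")        # $200, within the $150-$250 range
print(f"100 kLoC total: ${system_cost / 1e6:.0f}M")   # $20M, close to the quoted $25M
```

Varying the assumed labour cost or productivity moves the answer proportionately; the point is only that the quoted per-LoC and whole-system figures are mutually consistent.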
Flaws and failure rates: From a safety perspective, we are concerned about the rate of occurrence of hazardous events or accidents during the life of the system. As indicated above, aerospace systems seem to achieve around 10⁻⁷ failures per hour. We are also aware of systems which have over 10⁷ hours of hazard-free operation, although there have been losses of availability. However, there is no practical way of measuring such figures prior to putting the software into service ([7]) and, in practice, the best that can be measured pre-operationally is about 10⁻³ or 10⁻⁴ failures per hour (the sketch at the end of this section illustrates why).

As an alternative to evaluating failure rates, we can try to measure the flaws in programs. We define a flaw as a deviation from intent. We are not concerned about all flaw types, so it is usual to categorise them, e.g., safety critical (sufficient to cause a hazard), safety related (can only cause a hazard in combination with another failure), and so on. On this basis, so far as we can obtain data, anything less than 1 flaw per kLoC is world class. The best we have encountered is 0.1 per kLoC, for Shuttle code.

Some observations are in order. First, these figures are for known flaws; by definition, we do not know how many unknown flaws there are. We might expect all known flaws to be removed; however, removing flaws is an error-prone process, and thus there comes a point where the risks of further change outweigh the benefits. Second, it is unclear how to relate flaw density to failure rate. There is evidence of a fairly strong correlation for some systems ([8]). In general, however, the correlation will depend on where the flaws are in the program. A system with a fairly high flaw density may have a low failure rate, and vice versa, depending on the distribution of flaws and demands (inputs). Third, commercial software has much higher flaw densities, perhaps 30-100 per kLoC: as much as two orders of magnitude higher!

We can also analyse where flaws are introduced and where they are removed, to try to assess the effectiveness of processes. The available data suggest that more than 70% of the flaws found after unit testing are requirements errors. We have heard figures as high as 85% ([9]). Late discovery of requirements errors is a major source of change, and of cost.

Cost/effectiveness conclusions: The available data are insufficient to answer the most important questions, namely which approaches are most effective in terms of achieving safety (e.g., fewest safety-related flaws in the software) or most cost-effective. There are too many other factors (size of system, complexity, change density, etc.) to make meaningful comparisons among the few data points available. It is not even possible to provide an objective answer to the question “does safety critical software cost too much?”, as software is typically used to implement functionality that would be infeasible in other technologies. Thus there are few cases in which we can directly compare the costs of achieving a given level of safety in software with the costs of achieving an equivalent level using other technologies.
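To illustrate the pre-operational measurement limit noted under “Flaws and failure rates” above, the following sketch applies the standard statistical-testing bound: observing t failure-free hours supports a failure-rate claim λ with confidence 1 − e^(−λt). This is a textbook argument consistent with the thrust of [7], not a formula from the paper; the function name and the 99% confidence level are our own choices.

```python
import math

# Failure-free test hours needed to support a failure-rate bound at a
# given confidence, using the standard exponential model:
#   confidence = 1 - exp(-rate * hours)  =>  hours = -ln(1 - confidence) / rate

def failure_free_hours(rate_per_hour: float, confidence: float = 0.99) -> float:
    """Hours of failure-free testing needed to claim 'rate <= rate_per_hour'."""
    return -math.log(1.0 - confidence) / rate_per_hour

for rate in (1e-3, 1e-4, 1e-7):
    hours = failure_free_hours(rate)
    print(f"rate <= {rate:.0e}/h: {hours:.1e} failure-free hours "
          f"(~{hours / 8760:.1f} years of continuous testing)")
```

At 10⁻³ to 10⁻⁴ failures per hour the required test time is months to a few years, which is feasible for a development programme; at 10⁻⁷ per hour it is thousands of years, which is why such rates can only be observed across a whole fleet in service, never demonstrated by testing beforehand.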